This is the notebook for the Module 10 Programming Assignment.
Here are a few tips for using the IPython HTML notebook:
You should rename this notebook to be <your JHED id>_PR10.ipynb. Do it right now.
Make certain the entire notebook executes before you submit it. (See Hint #5 above)
Change the following variables:
In [1]:
name = "Shehzad Qureshi"
jhed_id = "squresh6"
if name == "Student Name" or jhed_id == "sname1":
    raise Exception("Change the name and/or JHED ID...preferably to yours.")
Add whatever additional imports you require here. Stick with the standard libraries and those required by the class. The import gives you access to these functions: http://ipython.org/ipython-doc/stable/api/generated/IPython.core.display.html which, among other things, will permit you to display HTML as the result of evaluated code (see HTML() or display_html()).
In [2]:
from IPython.core.display import *
from StringIO import StringIO
import numpy as np
In this programming assignment you will implement a feed forward neural network (multi-layer perceptron) and the back propagation algorithm. You only need one hidden layer but the number of input nodes, hidden nodes and output nodes should be configurable.
You will apply your neural network to a regression task using the accompanying file: neural_network_regression.csv. The function you're learning looks like this:
In [3]:
Image( filename="neural_network_regression.png")
Out[3]:
which is clearly not learnable as a simple linear regression.
You're going to have to try a few different numbers of nodes in the hidden layer. What you really should do is put the evaluation techniques we learned last week to good use, but that might be a bit much to expect in one week. If you can do it, you should. Otherwise, guesstimate.
For simplicity, you should assume that you have a train function that takes parameters and spits out some representation of the neural network. That representation is entirely up to you.
The second function is output. It should take your network representation and an instance and emit y_hat. If you are using test data, this means you need to supply only the "xs" part to the output function and then compare y_hat and y.
Don't forget that you need to add $x_0=1$ everywhere (the "biases") inside the network...every node really is just a mini-logistic regression. I've also started you out with some default parameter values. You may need to change them.
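To make that concrete, a single node with two inputs computes $\hat y = \sigma(\theta_0 \cdot 1 + \theta_1 x_1 + \theta_2 x_2)$, where the leading $1$ is the bias $x_0$ and $\sigma$ is the sigmoid function defined below.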
Normally, you'd have to apply mean normalization, which would add a whole other layer of parameters to store in the network Dict, but the values of x1, x2 and y are already on the interval (0, 1), so you don't have to worry about that for this assignment.
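Just to illustrate what we're skipping, a minimal sketch of mean normalization (a hypothetical helper, not used anywhere in this assignment) might look like:
def mean_normalize(data):
    # rescale each column to be roughly centered on 0 with unit-ish spread
    mu = data.mean(axis=0)
    spread = data.max(axis=0) - data.min(axis=0)
    return (data - mu) / spread
If we did need it, the column means and spreads would have to be stored alongside the thetas so new instances could be normalized the same way.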
Let's start by defining what the network looks like. We need to keep track of the hidden and output theta values, therefore we need two lists.
Each list will have at least one theta value and a bias to start with. All values are initialized to a random number in the interval [0, 1).
In [4]:
def define_network(num_input, num_hidden, num_output):
    hidden = np.random.random((num_hidden, num_input + 1))
    output = np.random.random((num_output, num_hidden + 1))
    return {"hidden" : hidden, "output" : output}
In [5]:
test_network = define_network(2, 2, 1)
test_network
Out[5]:
Let's add a way to prettify the network output. At the very least, make it a bit more readable.
In [6]:
def print_network(network):
    print "\nOutput:"
    for x, row in enumerate(network["output"]):
        for y, col in enumerate(row):
            print "T[{0}][{1}] = {2}".format(x, y, col)
    print
    print "Hidden:"
    for x, row in enumerate(network["hidden"]):
        for y, col in enumerate(row):
            print "T[{0}][{1}] = {2}".format(x, y, col)
    print
In [7]:
print_network(test_network)
We need to split the incoming data into input and output. We shall insert an initial bias of $1.0$ to use as $x_0$.
In [8]:
def split_data(dataset, num_inputs):
    # split the columns into inputs (the first num_inputs columns) and outputs (the rest)
    x, y = np.array_split(dataset, [num_inputs], axis=1)
    # prepend the bias column x_0 = 1
    ones = np.ones((x.shape[0], 1))
    x = np.column_stack((ones, x))
    return x, y
In [9]:
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
test_x, test_y = split_data(dataset, 2)
print test_x.shape, test_y.shape
This is our sigmoid function. It works on single values or arrays.
$\hat{y} = {1 \over {1 + e^{-z}}}$
where $z$ is defined as:
$z = \sum_i{\theta_i \cdot x_i}$
In [10]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
In [11]:
print sigmoid(0.1) # = 0.5249
print sigmoid(0.5) # 0.6224
print sigmoid(np.array([0.1, 0.5, -0.162, 0.2408]))
Now we need to calculate the output of the network, at both the hidden nodes and the output nodes. We add a bias value of 1.0 to $y_i^H$ to complete the matrix calculation.
In [12]:
def calculate_output(network, x):
    x1 = np.asmatrix(x)
    y_hidden = sigmoid(np.dot(x1, network["hidden"].T))
    y_hidden_bias = np.insert(y_hidden, 0, 1, axis=1)
    y_output = sigmoid(np.dot(y_hidden_bias, network["output"].T))
    return np.asarray(y_hidden), np.asarray(y_output)
In [13]:
test_y_h, test_y_o = calculate_output(test_network, test_x)
print test_y_h[0], test_y_o[0]
Now we need to calculate the error in the output. This can be done directly since $y$ is given.
$\delta^o = \hat y \cdot (1 - \hat y) \cdot (y - \hat y)$
where $\hat y$ is the calculated output and $y$ is the truth.
In [14]:
def calculate_delta_output(y, y_h):
    d_o = y_h * (1 - y_h) * (y - y_h)
    return np.asarray(d_o)
In [15]:
test_d_o = calculate_delta_output(test_y, test_y_o)
print test_d_o[0]
We also need to calculate the error in the hidden nodes. This is done by weighting each output's error by the theta connecting the hidden node to that output.
$\delta_i^H = \hat y_i^H \cdot (1 - \hat y_i^H) \cdot \sum_j {\theta_{ji}^o \cdot \delta_j^o}$
where $\hat y_i^H$ is the output of hidden node $i$, $\theta_{ji}^o$ is the theta value of the connection from hidden node $i$ to output $j$, and $\delta_j^o$ is the error at that output.
In [16]:
def calculate_delta_hidden(theta_o, y_h, d_o):
    # drop the bias column so only the hidden-to-output thetas remain
    t = np.delete(theta_o, 0, axis=1)
    # back-propagate each example's output error through the output thetas
    d_h = y_h * (1 - y_h) * np.dot(d_o, t)
    return np.asarray(d_h)
In [17]:
test_d_h = calculate_delta_hidden(test_network["output"], test_y_h, test_d_o)
test_d_h[0]
Out[17]:
This function will update our theta values for the network, given all necessary parameters. The update equations, with learning rate $\alpha$, are:
$\theta_i^o = \theta_i^o + \alpha \cdot \delta^o \cdot y_i^H$
$\theta_i^H = \theta_i^H + \alpha \cdot \delta_i^H \cdot x_i$
In [18]:
def update_theta(network, d_o, d_h, y_h, x, alpha):
    x1 = np.atleast_2d(x)
    y_h_b = np.insert(np.atleast_2d(y_h), 0, 1, axis=1)
    d_o = np.atleast_2d(d_o)
    d_h = np.atleast_2d(d_h)
    # each output theta row moves along the (biased) hidden outputs, scaled by its delta
    output = network["output"] + alpha * np.dot(d_o.T, y_h_b)
    # each hidden theta row moves along the (biased) inputs, scaled by its delta
    hidden = network["hidden"] + alpha * np.dot(d_h.T, x1)
    return {"hidden" : hidden, "output" : output}
In [19]:
new_network = update_theta(test_network, test_d_o[0], test_d_h[0], test_y_h[0], test_x[0], 0.05)
print_network(new_network)
The train function takes in a dataset, a parameter set of the number of input, hidden and output nodes, as well as epsilon (for convergence checking) and alpha (learning rate) values.
After defining the network and splitting the data as necessary, it repeats the following operations until convergence: calculate the hidden and output values for every example, calculate the output and hidden deltas, and update the theta values one example at a time.
The first two are vectorized using Numpy. The third cannot be vectorized since each update depends on the previous theta values.
In [20]:
def train(dataset, num_inputs, num_hidden, num_output, epsilon=0.0000001, alpha=0.05, debug=True):
    network = define_network(num_inputs, num_hidden, num_output)
    x, y = split_data(dataset, num_inputs)
    previous_error, current_error = 0, 1
    while np.abs(current_error - previous_error) > epsilon:
        y_hidden, y_output = calculate_output(network, x)
        delta_output = calculate_delta_output(y, y_output)
        delta_hidden = calculate_delta_hidden(network["output"], y_hidden, delta_output)
        for i, _ in enumerate(dataset):
            network = update_theta(network, delta_output[i], delta_hidden[i], y_hidden[i], x[i], alpha)
        previous_error, current_error = current_error, np.mean(y_output - y)
        if debug: print "Error:", np.abs(previous_error - current_error)
    return network
Now let's test our network with the given data. We can assume that $2^n / n$ hidden nodes is a good starting point, which for $n = 2$ inputs gives $2^2 / 2 = 2$.
In [21]:
dataset = np.loadtxt("neural_network_regression.csv", delimiter=',')
network = train(dataset, 2, 2, 1)
print_network(network)
The output function will take a neural network as defined above and an example and return the calculated output of the network. This will be used for validation.
In [22]:
def output(network, instance):
    return calculate_output(network, instance)[1]
In [23]:
x, y, y_h = [], [], []
for i in xrange(5):
    x.append(np.insert(dataset[i][:-1], 0, 1))
    y.append(dataset[i][-1:])
for i in xrange(5):
    y_h.extend(output(network, x[i]))
print "Error:", np.abs(np.subtract(y_h, y))
Now we need to evaluate the model. We will do so using a 67/33 split for training and validation, with 2 hidden nodes.
In [24]:
def partition(dataset, fold, k):
    segments = np.array_split(dataset, k)
    test = segments[fold-1]
    training = [segments[i] for i in xrange(k) if i != (fold-1)]
    return np.concatenate(training), test
In [25]:
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
error = []
for i in xrange(10):
    np.random.shuffle(dataset)
    training, validation = partition(dataset, 3, 3)
    network = train(training, 2, 2, 1, debug=False)
    x, y = split_data(validation, 2)
    y_h = np.array([output(network, xval) for xval in x]).reshape(y.shape)
    mse = np.average((y_h - y)**2)
    error.append(mse)
print "Mean Square Error:", np.average(error)
Now we'll calculate the mean square error using 10-fold cross validation.
In [26]:
def cross_validation(dataset, n, folds=10):
    error = []
    for i in xrange(1, folds+1):
        training_set, test_set = partition(dataset, i, folds)
        network = train(training_set, 2, n, 1, debug=False)
        x, y = split_data(test_set, 2)
        y_h = np.array([output(network, xval) for xval in x]).reshape(y.shape)
        error.append(np.average((y_h - y)**2))
    return np.average(error)
In [27]:
dataset = np.loadtxt('neural_network_regression.csv', delimiter=',')
error = cross_validation(dataset, 2)
print "Average MSE:", error
Looks like the average error is roughly the same.
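If I wanted to actually compare a few hidden-layer sizes rather than guess, a quick sweep with the cross_validation function above would look like this (the candidate sizes are just my own picks):
for n in [2, 4, 8]:
    print n, "hidden nodes, average MSE:", cross_validation(dataset, n)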
Summary
Artificial Neural Networks perform a similar function to regression in the Machine Learning realm: they attempt to fit a function to the data by recognizing trends in it. Instead of adjusting the variables bit by bit as necessary, Artificial Neural Networks work by back-propagating the error in the output layer to the hidden layers. The theta values are then adjusted accordingly until convergence.
Artificial Neural Networks do tend to be prone to overfitting, though. They can fit the training data too closely, and consequently the validation set shows more error.
I managed to vectorize most of the calculations, including the output and error calculations. Updating the thetas required iteration because each update depends on the previous value of theta. I couldn't figure out how to get Numpy to do that, and I'm not sure it's possible.
This particular assignment was a frustrating attempt to generalize matrix calculations. I did my best to generalize the formulas for calculating output and error, but it seemed like there was always a test case that would break the generalization. What worked on paper didn't necessarily work well in code either.
I ended up sticking to a value of 2 for the number of hidden nodes, which seems to work well enough. Surprisingly, it converges very quickly, so I question whether I'm actually doing it right.
In [ ]: